准确而健壮的本地化是移动代理的基本需求。视觉惯性进程(VIO)算法将信息从摄像机和惯性传感器中利用到估计位置和翻译。最近基于深度学习的VIO模型以数据驱动的方式提供姿势信息,而无需设计手工制作的算法,因此吸引了注意力。现有的基于学习的VIO模型依赖于经常性模型来融合多模式数据和过程传感器信号,这些模型很难训练并且不够有效。我们提出了一个基于学习的新型VIO框架,并有效地结合了视觉和惯性特征,以供各州估计。我们提出的模型也能够准确,稳健地估计,即使在具有挑战性的情况下,例如在阴天和充满水的地面上,对于传统的Vio算法而言,这很难提取视觉特征。实验验证了它在不同场景中的表现优于传统和基于学习的VIO基线。
translated by 谷歌翻译
在语义细分中,将高级上下文信息与低级详细信息集成至关重要。为此,大多数现有的分割模型都采用双线性启动采样和卷积来具有不同尺度的地图,然后以相同的分辨率对齐。但是,双线性启动采样模糊了这些特征地图和卷积中所学到的精确信息,这会产生额外的计算成本。为了解决这些问题,我们提出了隐式特征对齐函数(IFA)。我们的方法的灵感来自隐式神经表示的快速扩展的主题,在该主题中,基于坐标的神经网络用于指定信号字段。在IFA中,特征向量被视为表示2D信息字段。给定查询坐标,附近的具有相对坐标的特征向量是从多级特征图中获取的,然后馈入MLP以生成相应的输出。因此,IFA隐含地将特征图在不同级别对齐,并能够在任意分辨率中产生分割图。我们证明了IFA在多个数据集上的功效,包括CityScapes,Pascal环境和ADE20K。我们的方法可以与各种体系结构的改进结合使用,并在共同基准上实现最新的计算准确性权衡。代码将在https://github.com/hzhupku/ifa上提供。
translated by 谷歌翻译
图形神经网络(GNNS)最流行的设计范例是1跳消息传递 - 反复反复从1跳邻居聚集特征。但是,1-HOP消息传递的表达能力受Weisfeiler-Lehman(1-WL)测试的界定。最近,研究人员通过同时从节点的K-Hop邻居汇总信息传递到K-HOP消息。但是,尚无分析K-Hop消息传递的表达能力的工作。在这项工作中,我们从理论上表征了K-Hop消息传递的表达力。具体而言,我们首先正式区分了两种k-hop消息传递的内核,它们在以前的作品中经常被滥用。然后,我们通过表明它比1-Hop消息传递更强大,从而表征了K-Hop消息传递的表现力。尽管具有较高的表达能力,但我们表明K-Hop消息传递仍然无法区分一些简单的常规图。为了进一步增强其表现力,我们引入了KP-GNN框架,该框架通过利用每个跳跃中的外围子图信息来改善K-HOP消息。我们证明,KP-GNN可以区分几乎所有常规图,包括一些距离常规图,这些图无法通过以前的距离编码方法来区分。实验结果验证了KP-GNN的表达能力和有效性。 KP-GNN在所有基准数据集中都取得了竞争成果。
translated by 谷歌翻译
在复杂的协调问题中,深层合作多智能经纪增强学习(Marl)的高效探索仍然依然存在挑战。在本文中,我们介绍了一种具有奇妙驱动的探索的新型情节多功能钢筋学习,称为EMC。我们利用对流行分解的MARL算法的洞察力“诱导的”个体Q值,即用于本地执行的单个实用程序功能,是本地动作观察历史的嵌入,并且可以捕获因奖励而捕获代理之间的相互作用在集中培训期间的反向化。因此,我们使用单独的Q值的预测误差作为协调勘探的内在奖励,利用集肠内存来利用探索的信息经验来提高政策培训。随着代理商的个人Q值函数的动态捕获了国家的新颖性和其他代理人的影响,我们的内在奖励可以促使对新或有前途的国家的协调探索。我们通过教学实例说明了我们的方法的优势,并展示了在星际争霸II微互动基准中挑战任务的最先进的MARL基础上的其显着优势。
translated by 谷歌翻译
Graph Neural Networks (GNNs), originally proposed for node classification, have also motivated many recent works on edge prediction (a.k.a., link prediction). However, existing methods lack elaborate design regarding the distinctions between two tasks that have been frequently overlooked: (i) edges only constitute the topology in the node classification task but can be used as both the topology and the supervisions (i.e., labels) in the edge prediction task; (ii) the node classification makes prediction over each individual node, while the edge prediction is determinated by each pair of nodes. To this end, we propose a novel edge prediction paradigm named Edge-aware Message PassIng neuRal nEtworks (EMPIRE). Concretely, we first introduce an edge splitting technique to specify use of each edge where each edge is solely used as either the topology or the supervision (named as topology edge or supervision edge). We then develop a new message passing mechanism that generates the messages to source nodes (through topology edges) being aware of target nodes (through supervision edges). In order to emphasize the differences between pairs connected by supervision edges and pairs unconnected, we further weight the messages to highlight the relative ones that can reflect the differences. In addition, we design a novel negative node-pair sampling trick that efficiently samples 'hard' negative instances in the supervision instances, and can significantly improve the performance. Experimental results verify that the proposed method can significantly outperform existing state-of-the-art models regarding the edge prediction task on multiple homogeneous and heterogeneous graph datasets.
translated by 谷歌翻译
There is increasing adoption of artificial intelligence in drug discovery. However, existing works use machine learning to mainly utilize the chemical structures of molecules yet ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecule's chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, namely PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM possesses two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
translated by 谷歌翻译
We present the Group Propagation Vision Transformer (GPViT): a novel nonhierarchical (i.e. non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit for tasks that involve perceiving fine-grained details such as detection and segmentation, but exchanging global information between these features is expensive in memory and computation because of the way self-attention scales. We provide a highly efficient alternative Group Propagation Block (GP Block) to exchange global information. In each GP Block, features are first grouped together by a fixed number of learnable group tokens; we then perform Group Propagation where global information is exchanged between the grouped features; finally, global information in the updated grouped features is returned back to the image features through a transformer decoder. We evaluate GPViT on a variety of visual recognition tasks including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves significant performance gains over previous works across all tasks, especially on tasks that require high-resolution outputs, for example, our GPViT-L3 outperforms Swin Transformer-B by 2.0 mIoU on ADE20K semantic segmentation with only half as many parameters. Code and pre-trained models are available at https://github.com/ChenhongyiYang/GPViT .
translated by 谷歌翻译
In recent years, the number of parameters of one deep learning (DL) model has been growing much faster than the growth of GPU memory space. People who are inaccessible to a large number of GPUs resort to heterogeneous training systems for storing model parameters in CPU memory. Existing heterogeneous systems are based on parallelization plans in the scope of the whole model. They apply a consistent parallel training method for all the operators in the computation. Therefore, engineers need to pay a huge effort to incorporate a new type of model parallelism and patch its compatibility with other parallelisms. For example, Mixture-of-Experts (MoE) is still incompatible with ZeRO-3 in Deepspeed. Also, current systems face efficiency problems on small scale, since they are designed and tuned for large-scale training. In this paper, we propose Elixir, a new parallel heterogeneous training system, which is designed for efficiency and flexibility. Elixir utilizes memory resources and computing resources of both GPU and CPU. For flexibility, Elixir generates parallelization plans in the granularity of operators. Any new type of model parallelism can be incorporated by assigning a parallel pattern to the operator. For efficiency, Elixir implements a hierarchical distributed memory management scheme to accelerate inter-GPU communications and CPU-GPU data transmissions. As a result, Elixir can train a 30B OPT model on an A100 with 40GB CUDA memory, meanwhile reaching 84% efficiency of Pytorch GPU training. With its super-linear scalability, the training efficiency becomes the same as Pytorch GPU training on multiple GPUs. Also, large MoE models can be trained 5.3x faster than dense models of the same size. Now Elixir is integrated into ColossalAI and is available on its main branch.
translated by 谷歌翻译
Node classification for graph-structured data aims to classify nodes whose labels are unknown. While studies on static graphs are prevalent, few studies have focused on dynamic graph node classification. Node classification on dynamic graphs is challenging for two reasons. First, the model needs to capture both structural and temporal information, particularly on dynamic graphs with a long history and require large receptive fields. Second, model scalability becomes a significant concern as the size of the dynamic graph increases. To address these problems, we propose the Time Augmented Dynamic Graph Neural Network (TADGNN) framework. TADGNN consists of two modules: 1) a time augmentation module that captures the temporal evolution of nodes across time structurally, creating a time-augmented spatio-temporal graph, and 2) an information propagation module that learns the dynamic representations for each node across time using the constructed time-augmented graph. We perform node classification experiments on four dynamic graph benchmarks. Experimental results demonstrate that TADGNN framework outperforms several static and dynamic state-of-the-art (SOTA) GNN models while demonstrating superior scalability. We also conduct theoretical and empirical analyses to validate the efficiency of the proposed method. Our code is available at https://sites.google.com/view/tadgnn.
translated by 谷歌翻译
Automated identification of myocardial scar from late gadolinium enhancement cardiac magnetic resonance images (LGE-CMR) is limited by image noise and artifacts such as those related to motion and partial volume effect. This paper presents a novel joint deep learning (JDL) framework that improves such tasks by utilizing simultaneously learned myocardium segmentations to eliminate negative effects from non-region-of-interest areas. In contrast to previous approaches treating scar detection and myocardium segmentation as separate or parallel tasks, our proposed method introduces a message passing module where the information of myocardium segmentation is directly passed to guide scar detectors. This newly designed network will efficiently exploit joint information from the two related tasks and use all available sources of myocardium segmentation to benefit scar identification. We demonstrate the effectiveness of JDL on LGE-CMR images for automated left ventricular (LV) scar detection, with great potential to improve risk prediction in patients with both ischemic and non-ischemic heart disease and to improve response rates to cardiac resynchronization therapy (CRT) for heart failure patients. Experimental results show that our proposed approach outperforms multiple state-of-the-art methods, including commonly used two-step segmentation-classification networks, and multitask learning schemes where subtasks are indirectly interacted.
translated by 谷歌翻译